Optimal Learning for Multi-pass Stochastic Gradient Methods
Junhong Lin, Lorenzo Rosasco
We analyze the learning properties of the stochastic gradient method when multiple passes over the data and mini-batches are allowed. In particular, we consider the square loss and show that for a universal step-size choice, the number of passes acts as a regularization parameter, and optimal finite sample bounds can be achieved by early-stopping. Moreover, we show that larger step-sizes are allowed when considering mini-batches. Our analysis is based on a unifying approach, encompassing both batch and stochastic gradient methods as special cases.
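To make the setting concrete, below is a minimal Python/NumPy sketch of multi-pass mini-batch SGM for the square loss, where the number of passes serves as the regularization parameter. The function name, default step-size, and batch size are illustrative assumptions, not the paper's exact protocol.

```python
import numpy as np

def multipass_sgm(X, y, step_size=0.1, batch_size=1, n_passes=10, seed=0):
    """Sketch of multi-pass mini-batch SGM for least squares.

    Updates w <- w - eta * (1/b) * X_b^T (X_b w - y_b) over random
    mini-batches, for `n_passes` epochs. Early stopping (choosing how
    many passes to run) plays the role of regularization.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    iterates = []  # iterate after each pass, e.g. for validation later
    for _ in range(n_passes):
        perm = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w = w - step_size * grad
        iterates.append(w.copy())
    return iterates
```

Comparing the validation error of the per-pass iterates returned by such a routine is one simple way to observe the early-stopping behavior the paper analyzes.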
Reviews: Optimal Learning for Multi-pass Stochastic Gradient Methods
This work provides a strong contribution: it appears to be the first to obtain optimal rates (up to log factors) for SGM, and moreover it handles a mini-batch analysis that includes the (full) batch method as a special case. Such rates had previously been established only for the (batch) ridge regression method. My interpretation of what all the results actually show is given in the Summary. I find the current solution of relying on cross-validation for adaptation to be a bit of an inelegant cop-out (even if there is a theoretically supported method for using it); given that several of your corollaries provide a guarantee where \zeta and \gamma enter the picture only through T^*, can you provide a self-monitoring method that decides when to stop? In particular, I find the most exciting results to be Corollaries 3.3 and 3.9, as only the stopping time depends on the (unknown) capacity parameters, and so such an online stopping mechanism might be possible.
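For reference, the cross-validation adaptation the reviewer mentions amounts to a hold-out rule for the stopping pass, sketched below. This reuses the hypothetical multipass_sgm sketch above; it is the generic validation-based rule, not the self-monitoring (online) stopping criterion the reviewer asks for, and the function name and data split are illustrative assumptions.

```python
import numpy as np

def select_stopping_pass(X_train, y_train, X_val, y_val, **sgm_kwargs):
    """Choose the number of passes by hold-out validation.

    Runs multi-pass SGM on the training split and returns the pass
    (and iterate) with the smallest validation square loss.
    """
    iterates = multipass_sgm(X_train, y_train, **sgm_kwargs)
    val_errors = [np.mean((X_val @ w - y_val) ** 2) for w in iterates]
    best_pass = int(np.argmin(val_errors)) + 1  # passes counted from 1
    return best_pass, iterates[best_pass - 1]
```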